Optimization and update system for deep learning models

ABSTRACT

Traditionally, a software application is developed, tested, and then published for use by end users. Any subsequent update made to the software application is generally in the form of a human-programmed modification made to the code in the software application itself, and further only becomes usable once tested and published by developers and/or publishers, and installed by end users having the previous version of the software application. This typical software application lifecycle causes delays not only in generating improvements to software applications, but also in making those improvements accessible to end users. To help avoid these delays and improve performance of software applications, deep learning models may be made accessible to the software applications for use in performing inferencing operations to generate inferenced data output for the software applications, which the software applications may then use as desired. These deep learning models can furthermore be improved independently of the software applications using manual and/or automated processes.

CLAIM OF PRIORITY

This application claims the benefit of U.S. Provisional Application No. 62/717,735, titled “CONTINUOUS OPTIMIZATION AND UPDATE SYSTEM FOR DEEP LEARNING MODELS,” filed Aug. 10, 2018, the entire contents of which is incorporated herein by reference.

RELATED APPLICATIONS

This application is related to co-pending U.S. application Ser. No. ______, titled “DEEP LEARNING MODEL EXECUTION USING TAGGED DATA” (Attorney Ref: NVIDP1276/18-SC-0202US01) filed Aug. ______, 2019, the entire contents of which is incorporated herein by reference.

This application is related to co-pending U.S. application Ser. No. ______, titled “AUTOMATIC DATASET CREATION USING SOFTWARE TAGS” (Attorney Ref: NVIDP1277/18-SC-0197US01) and filed Aug. ______, 2019, the entire contents of which is incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to deep learning used by software applications.

BACKGROUND

Traditionally, a software application is developed, tested, and then published for use by end users. Any subsequent update made to the software application is generally in the form of a human-programmed modification made to the code in the software application itself, and further only becomes usable once tested and published by an application developer and publisher, and installed by end users having the previous version of the software application. This typical software application lifecycle causes delays not only in generating improvements to software applications, but also in making those improvements accessible to end users.

There is a need for addressing these issues and/or other issues associated with the prior art.

SUMMARY

A method, computer readable medium, and system are disclosed for improving deep learning models that perform inferencing operations to provide inferenced data to software applications. In an embodiment, a deep learning model usable for performing inferencing operations and for providing inferenced data is stored. Additionally, the deep learning model is updated to create an updated version of the deep learning model. Further, the updated version of the deep learning model is distributed to a client for use in providing the inferenced data.

In another embodiment, a deep learning model is stored. Additionally, the deep learning model is executed to perform inferencing operations and to provide inferenced data to a software application. Further, an updated version of the deep learning model is received. Still yet, the updated version of the deep learning model is executed to provide additional inferenced data to the software application.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a block diagram of a system including a server that provisions a deep learning model to a client for use by a software application installed on the client, in accordance with an embodiment.

FIG. 2 illustrates a flowchart of a server method for improving a deep learning model for use by a client, in accordance with an embodiment.

FIG. 3 illustrates a flowchart of a client method for implementing an improved deep learning model that provides inferenced data to a local software application, in accordance with an embodiment.

FIG. 4A illustrates a block diagram of a system 400 for updating a deep learning model that performs inferencing operations and provides inferenced data to a software application, in accordance with an embodiment.

FIG. 4B illustrates a flowchart of the method of the client of FIG. 4A, in accordance with an embodiment.

FIG. 5A illustrates inference and/or training logic, according to at least one embodiment.

FIG. 5B illustrates inference and/or training logic, according to at least one embodiment.

FIG. 6 illustrates training and deployment of a neural network, according to at least one embodiment.

FIG. 7 illustrates an example data center system, according to at least one embodiment.

DETAILED DESCRIPTION

FIG. 1 illustrates a block diagram of a system 100 including a server 101 that provisions a deep learning model 102 to a client 103 for use by a software application 104 installed on the client 103, in accordance with an embodiment.

With respect to the present description, the server 101 may be any computing device, virtualized computing device, or combination of devices, capable of communicating with the client 103 over a wired or wireless connection, for the purpose of provisioning the deep learning model 102 to the client 103 for use by a software application 104 installed on the client 103. For example, the server 101 may include a hardware memory (e.g. random access memory (RAM), etc.) for storing the deep learning model 102 and a hardware processor (e.g. central processing unit (CPU), graphics processing unit (GPU), etc.) for provisioning the deep learning model 102 from the memory to the client 103 over the wired or wireless connection. The server 101 may provision the deep learning model 102 to the client 103 by sending a copy of the deep learning model 102 over the wired or wireless connection to the client 103.

Also with respect to the present description, the client 103 may be any computing device (including, without limitation, computing devices that are wholly or partially virtualized) capable of communicating with the server 101 over the wired or wireless connection, for the purpose of receiving from the server 101 the deep learning model 102 for use by the software application 104 installed on the client 103. Thus, the client 103 need not be an end-user device (e.g. personal computer, laptop, mobile phone, etc.) but may also be a server or other cloud-based computer system having the software application 104 installed thereon. In the case where the client 103 is a cloud-based computer system, output of the software application 104 may optionally be streamed or otherwise communicated to an end-user device. Generally, the client 103 may include a memory for storing the deep learning model 102 and a processor by which the software application 104 installed on the client 103 uses the deep learning model 102 for obtaining inferenced data. By storing a copy of the deep learning model 102 at the client (e.g. on a hard drive of the client), the client can execute the deep learning model 102 locally.

The deep learning model 102 is a machine learned network (e.g. deep neural network) that is trained to perform inferencing operations and to provide inferenced data from input data. The deep learning model 102 may be trained using supervised or unsupervised training techniques. Optionally, the server 101 may be used to perform the training of the deep learning model 102, or may receive the already trained deep learning model 102 from another device.

The deep learning model 102 may be trained for performing any desired type of inferencing and making any desired type of inferences. However, in the present embodiment, the deep learning model 102 outputs inferences that are usable by the software application 104 installed on the client 103. It should be noted that the deep learning model 102 may similarly be used by other software applications which may be installed on the client 103 or other clients, and thus may not necessarily be specifically trained for use by the software application 104 but instead may be trained more generically for use by multiple different software applications. In any case, the deep learning model 102 may not be coded within the software application 104 itself, but may be accessible to the software application 104 as external functionality (e.g. as a software patch) via an application programming interface (API). As a result, the deep learning model 102 need not be developed and provided by the same developer as the software application 104 but instead may be developed and provided by a third-party developer.

In the present embodiment, the software application 104 installed on the client 103 provides input data to the deep learning model 102, which processes the input data to perform inferencing and/or to return one or more inferences (i.e. inferenced data) for the input data. Accordingly, the deep learning model 102 is trained to process the input data and make inferences therefrom. The inferenced data is output by the deep learning model 102 to the software application 104 for use by functions, tasks, etc. of the software application 104.

There are various use cases for the system 100 described above. In one embodiment, the software application 104 may be a video game, a virtual reality application, an image classification or other image processing application, a sensor data analysis application, or another graphics-related computer program. In this embodiment, the deep learning model 102 may provide certain image-related inferences, such as providing from an input image or other input data an anti-aliased image, an image with upscaled resolution, a denoised image, and/or any other output image that is modified in at least one respect from the input image or other input data. As another example, the deep learning model 102 may provide inference output that can be used to apply certain video-related effects, such as providing from input video or other input data a slow-motion version of the input video or other input data, a super sampling of the input video or other input data, etc.

In another embodiment, the software application 104 may be a voice recognition application or other audio-related computer program. In this embodiment, the deep learning model 102 may provide inference output that can be used to apply certain audio-related effects, such as providing from an input audio or other input data a language translation, a voice recognized command, and/or any other output that is inferenced from the input audio or other input data.

The system 100 configuration described above enables improvements to be made to the deep learning model 102 without necessarily requiring any changes within the software application 104 itself. Thus, the software application 104, and in turn an end-user or other system using the software application 104, may inherently benefit from the improvements made to the deep learning model 102 without the usual delays associated with updating the software application 104 itself. All that may be required is that the copy of the deep learning model 102 on the client 103 be updated to the improved version.
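The decoupling described above can be illustrated with a minimal Python sketch. The `FeatureAPI` class, its `infer` and `replace_model` methods, and the two toy upscaling functions are hypothetical stand-ins assumed for illustration; the present description does not define a concrete API. Because the application code only calls `infer`, replacing the model behind the API changes the inferenced data without any change to the application itself.

```python
from typing import Callable, List, Sequence

# Hypothetical model signature: any callable mapping input data to inferenced data.
Model = Callable[[Sequence[float]], List[float]]


class FeatureAPI:
    """Hypothetical API through which an application reaches an external model."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def infer(self, input_data: Sequence[float]) -> List[float]:
        # The application passes input data in and receives inferenced data back;
        # it never depends on how the model is implemented.
        return self._model(input_data)

    def replace_model(self, model: Model) -> None:
        # Updating the client's copy of the model; no application code changes.
        self._model = model


def upscale_v1(signal: Sequence[float]) -> List[float]:
    # Toy "model": nearest-neighbour 2x upscaling of a 1-D signal.
    return [s for s in signal for _ in range(2)]


def upscale_v2(signal: Sequence[float]) -> List[float]:
    # Toy "improved model": 2x upscaling with linear interpolation between samples.
    out: List[float] = []
    for i, s in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else s
        out.extend([s, (s + nxt) / 2])
    return out


def application_frame(api: FeatureAPI) -> List[float]:
    # Stand-in for application code that is never modified.
    return api.infer([0.0, 1.0, 2.0])


api = FeatureAPI(upscale_v1)
print(application_frame(api))  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
api.replace_model(upscale_v2)
print(application_frame(api))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.0]
```

The same `application_frame` function produces different output after the model is replaced, which mirrors how an end user may benefit from an improved model without reinstalling the application.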

For example, when the deep learning model 102 is improved to be faster, to be less computation-intensive, and/or to provide more accurate inferences, the software application 104 may inherently be improved by way of its use of the deep learning model 102 during execution thereof. In other words, the software application 104 may likewise provide faster results, results requiring fewer computations, and/or more accurate results through its use of the improved deep learning model 102.

The embodiments below describe systems and methods specifically for improving deep learning models that provide inferenced data to software applications. It should be noted that the systems and methods described below may be implemented in the context of the system 100 of FIG. 1.

FIG. 2 illustrates a flowchart of a server method 200 for improving a deep learning model for use by a client, in accordance with an embodiment. Accordingly, in one embodiment, the method 200 may be performed by the server 101 of FIG. 1.

In operation 201, a deep learning model is stored. In the context of the present method 200, the deep learning model is usable for performing inferencing operations and/or providing inferenced data to a software application (e.g. such as the deep learning model 102 used by the software application 104 of FIG. 1). The deep learning model may be stored locally (e.g. by the server 101). In one embodiment, the deep learning model may be stored in a local repository with other deep learning models usable for performing inferencing operations and/or providing other types of inferenced data to the software application or other software applications.

In operation 202, the deep learning model is updated to create an improved (updated) version of the deep learning model. It should be noted that any aspect(s) of the deep learning model may be updated to create the improved version of the deep learning model. In any case though, the update to the deep learning model improves (e.g. optimizes) the deep learning model in at least one respect.

In particular, the deep learning model may be updated by retraining the deep learning model and/or reconfiguring the deep learning model with new parameters (e.g., weights) or hyperparameters. The updating may be performed automatically by software and/or other neural networks. Thus, the process of updating the deep learning model to create an improved version thereof may be performed without requiring user intervention.

In one embodiment, as noted above, the deep learning model may be retrained, specifically using a changed dataset. For example, where the deep learning model was last trained using a particular dataset, the deep learning model may be retrained using a dataset that is changed from the particular dataset. The changed dataset may include additional data that was not included in the particular dataset that was last used to train the deep learning model and/or may remove data that was included in the particular dataset.

In another embodiment, as noted above, the deep learning model may be updated with one or more reconfigurations being made to the deep learning model. With respect to the option to reconfigure the deep learning model, the deep learning model may be updated according to a hyperparameter adjustment. In the context of the present description, a hyperparameter refers to a parameter whose value is used to control the learning process for the deep learning model (as opposed to the values of other parameters that are learned). For example, where the deep learning model was last trained according to a particular hyperparameter or a particular combination of hyperparameters, the deep learning model may be retrained according to one or more hyperparameters that are changed from the particular hyperparameter(s).

Further with respect to the option to reconfigure the deep learning model, the deep learning model may be updated with a layer substitution. For example, where the deep learning model included multiple particular layers, the deep learning model may be updated to include, replace, etc. one or more layers that are different from the particular layers. Similarly, the deep learning model may be updated with layer fusing (e.g. combining two or more of the particular layers).
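Layer fusing can take many forms, and the present description does not specify one. As one illustrative case, assumed here, two consecutive fully connected layers with no nonlinearity between them can be fused into a single layer, because W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The following minimal Python sketch, with hypothetical helper names, checks that the fused layer produces the same output as the two original layers.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    # A fully connected layer without activation: w @ x + b.
    return [v + c for v, c in zip(matvec(w, x), b)]


def fuse_linear(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Tuple[Matrix, Vector]:
    # Only valid when no nonlinearity sits between the two layers.
    return matmul(w2, w1), linear(w2, b2, b1)


w1, b1 = [[1, 2], [0, 1]], [1, 0]
w2, b2 = [[2, 0], [1, 1]], [0, 1]
x = [3, 4]

two_layers = linear(w2, b2, linear(w1, b1, x))
w, b = fuse_linear(w1, b1, w2, b2)
one_layer = linear(w, b, x)
print(two_layers, one_layer)  # [24, 17] [24, 17]
```

The fused model performs one matrix-vector product per input instead of two, which is the kind of performance gain a layer-fusing update may target.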

Also with respect to the option to reconfigure the deep learning model, the deep learning model may be updated to use input stacking. For example, particular inputs last used by the deep learning model may be changed, such as by stacking inputs. The stacked inputs may be used to artificially increase the feature counts of tensors in the deep learning model. In other embodiments with respect to the option to reconfigure the deep learning model, the deep learning model may be updated to include changed code, such as high-level code (at a software level), or low-level code (e.g. at a GPU level with GPU assembler code, or even machine code).
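The present description does not spell out how inputs are stacked. A minimal Python sketch, assuming that stacking means concatenating several per-position inputs (for example, a current frame and a previous frame) along the feature axis, shows how the feature count per position grows while the number of positions stays fixed. The function `stack_inputs` and the frame values are hypothetical.

```python
from typing import List, Sequence

# Each input is a list of positions (e.g. pixels); each position holds a feature vector.
Tensor2D = Sequence[Sequence[float]]


def stack_inputs(*inputs: Tensor2D) -> List[List[float]]:
    if not inputs:
        raise ValueError("need at least one input")
    positions = len(inputs[0])
    if any(len(t) != positions for t in inputs):
        raise ValueError("all inputs must have the same number of positions")
    # Concatenate the feature vectors of every input at each position, in argument order.
    return [[f for t in inputs for f in t[i]] for i in range(positions)]


current = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 pixels x 3 features (RGB)
previous = [[0.0, 0.2, 0.2], [0.4, 0.4, 0.6]]  # same pixels from a prior frame
stacked = stack_inputs(current, previous)
print(len(stacked), len(stacked[0]))  # 2 6
```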

As noted above, any aspect(s) of the deep learning model, such as any combination of the embodiments mentioned above, may be updated to create the improved version of the deep learning model. As an option, the aspect(s) that are changed for updating the deep learning model may be selected automatically. For example, the aspect(s) may be iteratively changed until the improved deep learning model is generated.
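The automatic, iterative selection of changes described above can be sketched as a search loop. This minimal Python sketch assumes an exhaustive grid search over a small, hypothetical search space and a hypothetical `toy_evaluate` scoring function; the present description does not name a search strategy, and in practice each evaluation would involve retraining or reconfiguring the model and measuring it on validation data.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]


def search_improved_config(
    baseline_score: float,
    search_space: Dict[str, List[Any]],
    evaluate: Callable[[Config], float],
) -> Tuple[Optional[Config], float]:
    """Try every combination of candidate changes; keep the best one that beats the baseline."""
    best_config: Optional[Config] = None
    best_score = baseline_score
    names = list(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        score = evaluate(config)  # in practice: retrain/reconfigure, then validate
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config: Config) -> float:
    # Hypothetical validation score that peaks at learning_rate 0.01 and favours more layers.
    return 0.9 - 10 * abs(config["learning_rate"] - 0.01) + 0.01 * config["num_layers"]


space = {"learning_rate": [0.1, 0.01, 0.001], "num_layers": [2, 4]}
config, score = search_improved_config(0.92, space, toy_evaluate)
print(config, round(score, 2))  # {'learning_rate': 0.01, 'num_layers': 4} 0.94
```

When no candidate beats the baseline, the function returns `None` for the configuration, corresponding to the case where no improved version is produced in that iteration.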

With respect to the present description, the deep learning model may be considered to be improved from the last (or any prior) version of the deep learning model when any aspect, or any preselected aspect(s), of the deep learning model has improved, such as accuracy (e.g. ability to provide more accurate inferences which may improve an end-user experience), quality (e.g. quality of inferences), performance (e.g. improved speed, reduced resource consumption, etc.), etc. A version of the deep learning model resulting from any iteration of retraining may be considered “improved” when any improvement benchmark, or any preselected improvement benchmark(s), are met. The improvement benchmarks may be predefined (e.g. manually), for example as thresholds for each category of improvement (i.e. accuracy, quality, and/or performance) or even sub-category of improvement (e.g. improved speed, reduced resource consumption). As an option, improvement metrics may be measured when the updated deep learning model is executed by different CPUs or GPUs, in which case the updated deep learning model may be considered “improved” for only those CPUs and/or GPUs that enabled the updated deep learning model to meet the improvement benchmark(s).
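Per-category thresholds and per-processor measurement can be sketched as follows. This minimal Python sketch assumes two categories only (an accuracy gain and a speedup), assumes every preselected benchmark must be met, and uses hypothetical processor names and measurements; none of these specifics come from the present description.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Metrics:
    accuracy: float    # fraction of correct inferences, higher is better
    latency_ms: float  # time per inference, lower is better


@dataclass(frozen=True)
class Benchmarks:
    min_accuracy_gain: float = 0.0  # required accuracy improvement over the baseline
    min_speedup: float = 1.0        # required baseline_latency / candidate_latency


def meets_benchmarks(baseline: Metrics, candidate: Metrics, bench: Benchmarks) -> bool:
    accuracy_ok = candidate.accuracy - baseline.accuracy >= bench.min_accuracy_gain
    speedup_ok = baseline.latency_ms / candidate.latency_ms >= bench.min_speedup
    return accuracy_ok and speedup_ok


def improved_processors(
    baseline: Dict[str, Metrics], candidate: Dict[str, Metrics], bench: Benchmarks
) -> Set[str]:
    # The updated model counts as "improved" only on processors where it met every benchmark.
    return {
        proc
        for proc, metrics in candidate.items()
        if proc in baseline and meets_benchmarks(baseline[proc], metrics, bench)
    }


bench = Benchmarks(min_accuracy_gain=0.005, min_speedup=1.1)
baseline = {"gpu_a": Metrics(0.90, 10.0), "gpu_b": Metrics(0.90, 10.0)}
candidate = {"gpu_a": Metrics(0.91, 8.0), "gpu_b": Metrics(0.91, 10.5)}
print(improved_processors(baseline, candidate, bench))  # {'gpu_a'}
```

Here the candidate is more accurate on both processors but is only faster on `gpu_a`, so it would be treated as improved for that processor alone.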

In operation 203, a client with a previous version of the deep learning model is determined. The previous version of the deep learning model may refer to any version of the deep learning model generated prior to the updated version of the deep learning model generated in operation 202.

Once the deep learning model is updated to create the improved version of the deep learning model and the client with the previous version of the deep learning model is determined, the updated version of the deep learning model is automatically distributed to the client when the updated version of the deep learning model meets or exceeds one or more improvement benchmarks, as shown in operation 204. The client may be client 103 of FIG. 1, for example. In the embodiment described above where the updated deep learning model is considered “improved” for only certain CPUs and/or GPUs (i.e. those that enabled the updated deep learning model to meet the improvement benchmark(s)), the improved version of the deep learning model may only be distributed to the client when the client includes one or more of those certain CPUs and/or GPUs. This may help ensure that the client is configured to be able to realize the improvements when executing the improved version of the deep learning model.
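Operations 203 and 204, together with the hardware restriction just described, can be sketched as a filter over known clients. In this minimal Python sketch the `ClientInfo` record, its fields, and the processor names are hypothetical; it simply selects clients that hold a previous version and whose processor is one on which the updated model met its benchmarks.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class ClientInfo:
    client_id: str
    model_version: int  # version of the deep learning model currently on the client
    processor: str      # e.g. the client's GPU model


def clients_to_update(
    clients: Iterable[ClientInfo], updated_version: int, improved_processors: Set[str]
) -> List[str]:
    """Select clients holding a previous version whose processor realizes the improvement."""
    return [
        c.client_id
        for c in clients
        if c.model_version < updated_version       # operation 203: previous version
        and c.processor in improved_processors     # benchmark met on this hardware
    ]


fleet = [
    ClientInfo("client-1", 1, "gpu_a"),
    ClientInfo("client-2", 1, "gpu_b"),
    ClientInfo("client-3", 2, "gpu_a"),
]
print(clients_to_update(fleet, 2, {"gpu_a"}))  # ['client-1']
```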

In one embodiment, the improved version of the deep learning model may be distributed to the client by communicating a copy of the improved version of the deep learning model to the client. To this end, the client may locally store, and thus locally execute, the copy of the improved version of the deep learning model. It should be noted that while the present method 200 references distributing the improved version of the deep learning model to a particular client, the method 200 may be implemented in other embodiments to distribute the improved version of the deep learning model to multiple different clients (e.g. that each have a previous version of the deep learning model).

It should be further noted that the improved version of the deep learning model may be distributed to the client responsive to a particular trigger. In one embodiment, the trigger may be the creation of the improved version of the deep learning model. In another embodiment, the trigger may be a scheduled distribution. In yet another embodiment, the trigger may be a request received from the client for an improved version of the deep learning model (e.g. as described in more detail below). When the server determines, responsive to the request, that it has a version of the deep learning model that has been updated from a version currently stored on the client, the server may distribute the updated version of the deep learning model to the client.
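The request-driven trigger can be sketched on the server side as a registry that records the latest version of each model and answers client requests. The `ModelRegistry` class, its methods, the model name, and the byte strings standing in for serialized models are all hypothetical; the sketch only shows the version comparison that decides whether an update is sent.

```python
from typing import Dict, Optional, Tuple

ModelRecord = Tuple[int, bytes]  # (version, serialized model)


class ModelRegistry:
    """Hypothetical server-side record of the latest version of each deep learning model."""

    def __init__(self) -> None:
        self._latest: Dict[str, ModelRecord] = {}

    def publish(self, name: str, version: int, blob: bytes) -> None:
        # Called when operation 202 produces an improved version; older versions are ignored.
        current = self._latest.get(name)
        if current is None or version > current[0]:
            self._latest[name] = (version, blob)

    def handle_update_request(self, name: str, client_version: int) -> Optional[ModelRecord]:
        # Return the model only if it is newer than the copy stored on the client.
        latest = self._latest.get(name)
        if latest is not None and latest[0] > client_version:
            return latest
        return None


registry = ModelRegistry()
registry.publish("upscaler", 3, b"weights-v3")
print(registry.handle_update_request("upscaler", 2))  # (3, b'weights-v3')
print(registry.handle_update_request("upscaler", 3))  # None
```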

To this end, the method 200 may be implemented for the deep learning model for creating an improved version of the deep learning model that can be used by the client to perform inferencing operations. Whereas current optimizations of deep learning models typically involve software engineers or data scientists conducting experiments to find better solutions, the present method 200 may allow the server to attempt a very large number of different possible combinations of changes to find improvements. This method 200 may be repeated over and over to provide ongoing and continuous deep learning model improvements that are then downloaded to the client to improve operations involving the deep learning model. Similarly, the method 200 may be implemented for other deep learning models to create improved versions of those deep learning models that can be used by any number of different clients to provide other types of inferenced data.

FIG. 3 illustrates a flowchart of a client method 300 for implementing an improved deep learning model that provides inferenced data to a local software application, in accordance with an embodiment. In one embodiment, the method 300 may be performed by the client 103 of FIG. 1.

In operation 301, a deep learning model is stored. In the context of the present method 300, the deep learning model is usable for providing inferenced data to a software application (e.g. such as the deep learning model 102 used by the software application 104 of FIG. 1). The deep learning model may be stored locally (e.g. by the client 103). In one embodiment, the deep learning model may be stored in a local repository with other deep learning models usable for providing other types of inferenced data to the software application or other software applications.

In operation 302, the deep learning model is executed to perform inferencing operations and to provide inferenced data to a software application. The deep learning model and the software application may both execute locally. In particular, the software application provides input data to the deep learning model, which processes the input data to generate one or more inferences (i.e. inferenced data) for the input data. The inferenced data is output by the deep learning model to the software application for use by functions, tasks, etc. of the software application.

It should be noted that the software application may use the deep learning model as often as required while the deep learning model is stored and is thus accessible to the software application. For example, various functions within the software application, or multiple executions of the same function, may cause input data to be provided to the deep learning model for the purpose of obtaining the inferenced data.

In operation 303, an updated version of the deep learning model is received. Thus, after some period in which the deep learning model is executed to provide inferenced data to a software application, the improved version of the deep learning model may be received. In one embodiment, the improved version of the deep learning model may be received from a server (e.g. server 101 of FIG. 1).

As an option, the improved version of the deep learning model may be received responsive to a trigger. In one embodiment, the trigger may occur on the server side, and thus the improved version of the deep learning model may be provided to the client proactively. For example, the trigger may be the creation of the improved version of the deep learning model at the server. As another example, the trigger may be a scheduled distribution at the server.

In another embodiment, the trigger may occur on the client side. The trigger may be scheduled, may be the initiated execution of the software application that uses the deep learning model, or may be a call to a feature API that causes execution of the deep learning model. Responsive to the client-side trigger, the client may request from the server an improved version of the deep learning model. When the server determines, responsive to the request, that it has a version of the deep learning model that is updated from a version currently stored on the client, the server may distribute the updated version of the deep learning model to the client.

Further, in operation 304, the updated version of the deep learning model is executed to provide additional inferenced data to the software application. In one embodiment, the updated version of the deep learning model may replace the last version of the deep learning model used by the software application (i.e. in operation 302). To this end, the software application may use the updated version of the deep learning model once received by the client.

FIG. 4A illustrates a block diagram of a system 400 for updating a deep learning model that performs inferencing operations and provides inferenced data to a software application, in accordance with an embodiment. It should be noted that the definitions and/or descriptions provided with respect to the embodiments above may equally apply to the present description.

As shown, a client 401 has installed thereon a software application 402 that uses one or more deep learning models stored in a local deep learning model store 403. Each of the deep learning models may perform a different type of inferencing and provide a different type of inferenced data, and thus may be usable (e.g. by the software application 402 and/or other software applications installed on the client) to obtain any needed inferenced data.

Additionally, a server 409 operates to update a deep learning model to create an updated version of the deep learning model 410. As shown, the server receives research data 404, which includes a new training dataset 407 and/or a new deep learning model design 408 (reconfiguration). The research data 404 may be derived from a newly generated public dataset 405 and/or from offline information 406 received in association with the software application.

The server 409 may update the deep learning model using manual training and tuning of the deep learning model by one or more users, and/or using automatic training and optimizing of the deep learning model by a neural network optimizer (not shown). The updated version of the deep learning model 410 is then distributed to the client 401 via a deep learning model update server 412 of a cloud service 411. Optionally, the client 401 may subscribe to the cloud service 411 to be provided access to deep learning models.

Each time the server 409 starts a new deep learning model training session, the metadata that describes the deep learning model, including all training hyperparameters, inferencing parameters, and the dataset, is stored either in a file or a database. This allows the deep learning model to be fully recreated at any time in the future. The server 409 can also use that metadata to conduct future experiments and to derive new deep learning models. At any point, the server 409 will likely have multiple deep learning models being trained and evaluated against improvement benchmarks.
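Storing per-session metadata in a file can be sketched with a JSON record. In this minimal Python sketch the field names, the model name, the dataset identifier, and the hyperparameter values are hypothetical; the present description only states that training hyperparameters, inferencing parameters, and the dataset are recorded. Round-tripping the record shows that the stored description is sufficient to reconstruct the session settings.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class TrainingSessionMetadata:
    model_name: str
    version: int
    dataset_id: str  # identifies the exact dataset used for this session
    training_hyperparameters: Dict[str, Any] = field(default_factory=dict)
    inferencing_parameters: Dict[str, Any] = field(default_factory=dict)


def save_metadata(meta: TrainingSessionMetadata, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(meta), f, indent=2, sort_keys=True)


def load_metadata(path: str) -> TrainingSessionMetadata:
    with open(path, encoding="utf-8") as f:
        return TrainingSessionMetadata(**json.load(f))


meta = TrainingSessionMetadata(
    model_name="denoiser",
    version=7,
    dataset_id="frames-2018-08",
    training_hyperparameters={"learning_rate": 0.001, "batch_size": 32, "epochs": 50},
    inferencing_parameters={"precision": "fp16"},
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session-7.json")
    save_metadata(meta, path)
    restored = load_metadata(path)
print(restored == meta)  # True
```

A database table keyed by model name and version would serve the same purpose; a JSON file is used here only because it keeps the sketch self-contained.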

FIG. 4B illustrates a flowchart of the method of the client 401 of FIG. 4A, in accordance with an embodiment. As a first sub-process of the method of the client 401, during runtime of the software application 402 (operation 450), a feature API is invoked (operation 451). The feature API may provide an interface to the deep learning model to allow the software application 402 to interface with the deep learning model.

Responsive to the invocation of the feature API, the client 401 determines whether the deep learning model has been updated since a last call made to the deep learning model by the software application 402 (decision 452). The client 401 may accomplish this by querying the local deep learning model store 403 for a latest stored version of the deep learning model.

Responsive to determining that the deep learning model has not been updated, the client 401 runs the deep learning model (operation 454) with input data provided by the software application 402, and returns inferenced data output by the deep learning model back to the software application 402 (operation 455).

Responsive to determining that the deep learning model has been updated, the client 401 loads the updated (improved) deep learning model from the local deep learning model store 403 (operation 453). This may be performed as a hot-swap (in real-time) during execution of the software application. The client 401 then runs the updated deep learning model (operation 454) with input data provided by the software application 402, and returns inferenced data output by the deep learning model back to the software application 402 (operation 455).
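The first sub-process of FIG. 4B (operations 451 through 455) can be sketched as a feature API that consults the local model store on every invocation and hot-swaps the loaded model when a newer version is present. The `LocalModelStore` and `FeatureAPI` classes and the toy lambda "models" are hypothetical stand-ins for the local deep learning model store 403 and real deep learning models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], List[float]]


class LocalModelStore:
    """Hypothetical stand-in for the local deep learning model store 403."""

    def __init__(self) -> None:
        self._models: Dict[str, Tuple[int, Model]] = {}

    def put(self, name: str, version: int, model: Model) -> None:
        self._models[name] = (version, model)

    def latest(self, name: str) -> Tuple[int, Model]:
        return self._models[name]


class FeatureAPI:
    """Hypothetical feature API invoked by the software application (operation 451)."""

    def __init__(self, store: LocalModelStore, name: str) -> None:
        self._store = store
        self._name = name
        self.version, self._model = store.latest(name)

    def invoke(self, input_data: Sequence[float]) -> List[float]:
        latest_version, latest_model = self._store.latest(self._name)  # decision 452
        if latest_version > self.version:
            # Operation 453: hot-swap the loaded model while the application keeps running.
            self.version, self._model = latest_version, latest_model
        return self._model(input_data)  # operations 454-455: run and return inferenced data


store = LocalModelStore()
store.put("denoiser", 1, lambda x: [v * 0.5 for v in x])
api = FeatureAPI(store, "denoiser")
print(api.invoke([2.0]), api.version)  # [1.0] 1
store.put("denoiser", 2, lambda x: [v * 0.25 for v in x])
print(api.invoke([2.0]), api.version)  # [0.5] 2
```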

As a second sub-process of the method of the client 401, the client 401 is triggered (operation 456) to check for a deep learning model update (operation 457). The trigger may be caused by a schedule, or by the feature API invocation in operation 451. The client 401 sends a request to the cloud service 411 for any updated version of the deep learning model. When the cloud service 411 has access to an updated version of the deep learning model, the client 401 downloads the updated deep learning model (operation 458) and stores the new model in the local deep learning model store 403 (operation 459).
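The second sub-process (operations 457 through 459) can be sketched as a function that asks the cloud service for a newer version and, if one exists, writes it into the local store. The `fake_fetch` function and the `CLOUD` dictionary are hypothetical stand-ins for the deep learning model update server 412; a real client would make a network request instead.

```python
from typing import Callable, Dict, Optional, Tuple

ModelRecord = Tuple[int, bytes]  # (version, serialized model)
# Hypothetical request to the cloud service: (model name, local version) -> newer record or None.
FetchUpdate = Callable[[str, int], Optional[ModelRecord]]


def check_for_update(store: Dict[str, ModelRecord], name: str, fetch: FetchUpdate) -> bool:
    """Operations 457-459: ask for a newer model, download it, and store it locally."""
    local_version = store[name][0] if name in store else 0
    update = fetch(name, local_version)
    if update is None:
        return False
    store[name] = update  # per FIG. 4B, the next feature API invocation would load this version
    return True


CLOUD: Dict[str, ModelRecord] = {"upscaler": (4, b"weights-v4")}


def fake_fetch(name: str, local_version: int) -> Optional[ModelRecord]:
    # Stand-in for the deep learning model update server 412.
    latest = CLOUD.get(name)
    if latest is not None and latest[0] > local_version:
        return latest
    return None


local_store: Dict[str, ModelRecord] = {"upscaler": (3, b"weights-v3")}
print(check_for_update(local_store, "upscaler", fake_fetch))  # True
print(local_store["upscaler"][0])                             # 4
print(check_for_update(local_store, "upscaler", fake_fetch))  # False
```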

Machine Learning

Deep neural networks (DNNs), including deep learning models, developed on processors have been used for diverse use cases, from self-driving cars to faster drug development, from automatic image captioning in online image databases to smart real-time language translation in video chat applications. Deep learning is a technique that models the neural learning process of the human brain, continually learning, continually getting smarter, and delivering more accurate results more quickly over time. A child is initially taught by an adult to correctly identify and classify various shapes, eventually being able to identify shapes without any coaching. Similarly, a deep learning or neural learning system needs to be trained in object recognition and classification for it to get smarter and more efficient at identifying basic objects, occluded objects, etc., while also assigning context to objects.

At the simplest level, neurons in the human brain look at various inputs that are received, importance levels are assigned to each of these inputs, and output is passed on to other neurons to act upon. An artificial neuron or perceptron is the most basic model of a neural network. In one example, a perceptron may receive one or more inputs that represent various features of an object that the perceptron is being trained to recognize and classify, and each of these features is assigned a certain weight based on the importance of that feature in defining the shape of an object.
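The weighted-input behavior of a perceptron described above can be shown in a few lines. This minimal Python sketch uses a step activation and hand-picked, hypothetical weights and bias that make the perceptron fire only when both binary input features are present (a logical AND).

```python
from typing import Sequence


def perceptron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> int:
    # Weighted sum: each input feature contributes according to its weight.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step activation: the neuron "fires" (outputs 1) only above the threshold.
    return 1 if activation > 0 else 0


weights, bias = [0.6, 0.6], -1.0  # hypothetical values that implement logical AND
print([perceptron(x, weights, bias) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]
```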

A deep neural network (DNN) model includes multiple layers of many connected nodes (e.g., perceptrons, Boltzmann machines, radial basis functions, convolutional layers, etc.) that can be trained with enormous amounts of input data to quickly solve complex problems with high accuracy. In one example, a first layer of the DNN model breaks down an input image of an automobile into various sections and looks for basic patterns such as lines and angles. The second layer assembles the lines to look for higher-level patterns such as wheels, windshields, and mirrors. The next layer identifies the type of vehicle, and the final few layers generate a label for the input image, identifying the model of a specific automobile brand.

Once the DNN is trained, the DNN can be deployed and used to identify and classify objects or patterns in a process known as inference. Examples of inference (the process through which a DNN extracts useful information from a given input) include identifying handwritten numbers on checks deposited into ATMs, identifying images of friends in photos, delivering movie recommendations to over fifty million users, identifying and classifying different types of automobiles, pedestrians, and road hazards in driverless cars, or translating human speech in real time.

During training, data flows through the DNN in a forward propagation phase until a prediction is produced that indicates a label corresponding to the input. If the neural network does not correctly label the input, then errors between the correct label and the predicted label are analyzed, and the weights are adjusted for each feature during a backward propagation phase until the DNN correctly labels the input and other inputs in a training dataset. Training complex neural networks requires massive amounts of parallel computing performance, including floating-point multiplications and additions. Inferencing is less compute-intensive than training; it is a latency-sensitive process in which a trained neural network is applied to new inputs it has not seen before to classify images, translate speech, and generally infer new information.
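The forward and backward propagation phases can be sketched for the simplest case, a single sigmoid neuron trained on one labeled example. This illustrative Python sketch assumes binary cross-entropy loss and plain gradient descent with a learning rate of 0.5; the function names are hypothetical, and a real DNN repeats the same pattern across many layers and many examples.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights: List[float], bias: float, x: List[float], label: int,
               lr: float = 0.5) -> Tuple[List[float], float, float]:
    """One forward pass and one backward pass for a single sigmoid neuron.

    With binary cross-entropy loss, the derivative of the loss with respect to the
    pre-activation z simplifies to (prediction - label), so each weight's gradient is
    that error multiplied by the input feature the weight is attached to.
    Returns (updated_weights, updated_bias, loss_before_update).
    """
    # Forward propagation: weighted sum of features, then activation.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    prediction = sigmoid(z)
    loss = -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))
    # Backward propagation: move every feature's weight against its gradient.
    error = prediction - label
    new_weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * error
    return new_weights, new_bias, loss


weights, bias = [0.0, 0.0], 0.0
for step in range(5):
    weights, bias, loss = train_step(weights, bias, x=[1.0, 2.0], label=1)
    print(step, round(loss, 4))  # the loss shrinks as the neuron learns to output 1
```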

Inference and Training Logic

As noted above, a deep learning or neural learning system needs to be trained to generate inferences from input data. Details regarding inference and/or training logic 515 for a deep learning or neural learning system are provided below in conjunction with FIGS. 5A and/or 5B.

In at least one embodiment, inference and/or training logic 515 may include, without limitation, a data storage 501 to store forward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, data storage 501 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during forward propagation of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments. In at least one embodiment, any portion of data storage 501 may be included with other on-chip or off-chip data storage, including a processor's L1, L2, or L3 cache or system memory.

In at least one embodiment, any portion of data storage 501 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, data storage 501 may be cache memory, dynamic random-access memory (“DRAM”), static random-access memory (“SRAM”), non-volatile memory (e.g., Flash memory), or other storage. In at least one embodiment, the choice of whether data storage 501 is internal or external to a processor, for example, or comprised of DRAM, SRAM, Flash or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors.

In at least one embodiment, inference and/or training logic 515 may include, without limitation, a data storage 505 to store backward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, data storage 505 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during backward propagation of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments. In at least one embodiment, any portion of data storage 505 may be included with other on-chip or off-chip data storage, including a processor's L1, L2, or L3 cache or system memory. In at least one embodiment, any portion of data storage 505 may be internal or external to one or more processors or other hardware logic devices or circuits. In at least one embodiment, data storage 505 may be cache memory, DRAM, SRAM, non-volatile memory (e.g., Flash memory), or other storage. In at least one embodiment, the choice of whether data storage 505 is internal or external to a processor, for example, or comprised of DRAM, SRAM, Flash or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors.

In at least one embodiment, data storage 501 and data storage 505 may be separate storage structures. In at least one embodiment, data storage 501 and data storage 505 may be the same storage structure. In at least one embodiment, data storage 501 and data storage 505 may be partially the same storage structure and partially separate storage structures. In at least one embodiment, any portion of data storage 501 and data storage 505 may be included with other on-chip or off-chip data storage, including a processor's L1, L2, or L3 cache or system memory.

In at least one embodiment, inference and/or training logic 515 may include, without limitation, one or more arithmetic logic unit(s) (“ALU(s)”) 510 to perform logical and/or mathematical operations based at least in part on, or indicated by, training and/or inference code, the result of which may be activations (e.g., output values from layers or neurons within a neural network) stored in an activation storage 520 that are functions of input/output and/or weight parameter data stored in data storage 501 and/or data storage 505. In at least one embodiment, activations stored in activation storage 520 are generated according to linear algebraic and/or matrix-based mathematics performed by ALU(s) 510 in response to performing instructions or other code, wherein weight values stored in data storage 505 and/or data storage 501 are used as operands along with other values, such as bias values, gradient information, momentum values, or other parameters or hyperparameters, any or all of which may be stored in data storage 505 or data storage 501 or another storage on or off-chip. In at least one embodiment, ALU(s) 510 are included within one or more processors or other hardware logic devices or circuits, whereas in another embodiment, ALU(s) 510 may be external to a processor or other hardware logic device or circuit that uses them (e.g., a co-processor). In at least one embodiment, ALUs 510 may be included within a processor's execution units or otherwise within a bank of ALUs accessible by a processor's execution units, either within the same processor or distributed between different processors of different types (e.g., central processing units, graphics processing units, fixed function units, etc.).
In at least one embodiment, data storage 501, data storage 505, and activation storage 520 may be on the same processor or other hardware logic device or circuit, whereas in another embodiment, they may be in different processors or other hardware logic devices or circuits, or some combination of same and different processors or other hardware logic devices or circuits. In at least one embodiment, any portion of activation storage 520 may be included with other on-chip or off-chip data storage, including a processor's L1, L2, or L3 cache or system memory. Furthermore, inferencing and/or training code may be stored with other code accessible to a processor or other hardware logic or circuit and fetched and/or processed using a processor's fetch, decode, scheduling, execution, retirement and/or other logical circuits.
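The matrix-based computation that produces the activations stored in activation storage 520 can be illustrated for one fully connected layer. This Python sketch assumes a ReLU activation function and represents weight and bias values as plain lists; the function name layer_activations is hypothetical, and actual hardware would perform these multiply-accumulate operations in parallel across ALUs rather than in a Python loop.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def layer_activations(weights: Matrix, inputs: Vector, bias: Vector) -> Vector:
    """Compute ReLU(W x + b) for one fully connected layer.

    weights: one row per output neuron (e.g., values held in data storage 501/505).
    inputs:  activations from the previous layer, or the network input.
    bias:    one bias value per output neuron.
    """
    if any(len(row) != len(inputs) for row in weights) or len(weights) != len(bias):
        raise ValueError("shape mismatch between weights, inputs, and bias")
    outputs = []
    for row, b in zip(weights, bias):
        # Multiply-accumulate: the core operation performed for each neuron.
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(max(0.0, z))  # ReLU activation
    return outputs


W = [[1.0, -1.0], [0.5, 0.5]]
print(layer_activations(W, [2.0, 1.0], [0.0, -2.0]))  # [1.0, 0.0]
```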

In at least one embodiment, activation storage 520 may be cache memory, DRAM, SRAM, non-volatile memory (e.g., Flash memory), or other storage. In at least one embodiment, activation storage 520 may be completely or partially within or external to one or more processors or other logical circuits. In at least one embodiment, the choice of whether activation storage 520 is internal or external to a processor, for example, or comprised of DRAM, SRAM, Flash or some other storage type may depend on available storage on-chip versus off-chip, latency requirements of training and/or inferencing functions being performed, batch size of data used in inferencing and/or training of a neural network, or some combination of these factors. In at least one embodiment, inference and/or training logic 515 illustrated in FIG. 5A may be used in conjunction with an application-specific integrated circuit (“ASIC”), such as a Tensor Processing Unit from Google, an intelligence processing unit (IPU) from Graphcore™, or a Nervana® (e.g., “Lake Crest”) processor from Intel Corp. In at least one embodiment, inference and/or training logic 515 illustrated in FIG. 5A may be used in conjunction with central processing unit (“CPU”) hardware, graphics processing unit (“GPU”) hardware or other hardware, such as field programmable gate arrays (“FPGAs”).

FIG. 5B illustrates inference and/or training logic 515, according to at least one embodiment. In at least one embodiment, inference and/or training logic 515 may include, without limitation, hardware logic in which computational resources are dedicated or otherwise exclusively used in conjunction with weight values or other information corresponding to one or more layers of neurons within a neural network. In at least one embodiment, inference and/or training logic 515 illustrated in FIG. 5B may be used in conjunction with an application-specific integrated circuit (ASIC), such as a Tensor Processing Unit from Google, an intelligence processing unit (IPU) from Graphcore™, or a Nervana® (e.g., “Lake Crest”) processor from Intel Corp. In at least one embodiment, inference and/or training logic 515 illustrated in FIG. 5B may be used in conjunction with central processing unit (CPU) hardware, graphics processing unit (GPU) hardware or other hardware, such as field programmable gate arrays (FPGAs). In at least one embodiment, inference and/or training logic 515 includes, without limitation, data storage 501 and data storage 505, which may be used to store weight values and/or other information, including bias values, gradient information, momentum values, and/or other parameter or hyperparameter information. In at least one embodiment illustrated in FIG. 5B, each of data storage 501 and data storage 505 is associated with a dedicated computational resource, such as computational hardware 502 and computational hardware 506, respectively. In at least one embodiment, each of computational hardware 502 and 506 comprises one or more ALUs that perform mathematical functions, such as linear algebraic functions, only on information stored in data storage 501 and data storage 505, respectively, the result of which is stored in activation storage 520.

In at least one embodiment, each of data storage 501 and 505 and corresponding computational hardware 502 and 506, respectively, correspond to different layers of a neural network, such that the resulting activation from one “storage/computational pair 501/502” of data storage 501 and computational hardware 502 is provided as an input to the next “storage/computational pair 505/506” of data storage 505 and computational hardware 506, in order to mirror the conceptual organization of a neural network. In at least one embodiment, each of storage/computational pairs 501/502 and 505/506 may correspond to more than one neural network layer. In at least one embodiment, additional storage/computational pairs (not shown) subsequent to or in parallel with storage/computational pairs 501/502 and 505/506 may be included in inference and/or training logic 515.
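The chaining of storage/computational pairs can be sketched in Python as a sequence of objects that each hold one layer's parameters and compute only on them, with each pair's activation fed to the next pair. StorageComputePair and run_pipeline are hypothetical names, and the ReLU activation is an assumption made for the example; this is a software analogy, not a description of the hardware itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageComputePair:
    """Hypothetical model of one storage/computational pair (e.g., 501/502).

    The pair's weights and bias sit next to the compute routine that uses them
    exclusively, mirroring the dedicated-resource arrangement of FIG. 5B.
    """
    weights: List[List[float]]
    bias: List[float]

    def compute(self, inputs: List[float]) -> List[float]:
        # ReLU(W x + b) using only this pair's own parameters.
        return [
            max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


def run_pipeline(pairs: List[StorageComputePair], inputs: List[float]) -> List[float]:
    activations = inputs
    for pair in pairs:
        # The activation produced by one pair becomes the input of the next pair.
        activations = pair.compute(activations)
    return activations


pair_501_502 = StorageComputePair(weights=[[1.0, 1.0], [1.0, -1.0]], bias=[0.0, 0.0])
pair_505_506 = StorageComputePair(weights=[[2.0, 1.0]], bias=[1.0])
print(run_pipeline([pair_501_502, pair_505_506], [3.0, 1.0]))  # [11.0]
```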

Neural Network Training and Deployment

FIG. 6 illustrates another embodiment for training and deployment of a deep neural network. In at least one embodiment, untrained neural network 606 is trained using a training dataset 602. In at least one embodiment, training framework 604 is a PyTorch framework, whereas in other embodiments, training framework 604 is a TensorFlow, Boost, Caffe, Microsoft Cognitive Toolkit/CNTK, MXNet, Chainer, Keras, Deeplearning4j, or other training framework. In at least one embodiment, training framework 604 trains an untrained neural network 606 and enables it to be trained using processing resources described herein to generate a trained neural network 608. In at least one embodiment, weights may be chosen randomly or by pre-training using a deep belief network. In at least one embodiment, training may be performed in either a supervised, partially supervised, or unsupervised manner.

In at least one embodiment, untrained neural network 606 is trained using supervised learning, wherein training dataset 602 includes an input paired with a desired output for the input, or where training dataset 602 includes input having known output and the output of the neural network is manually graded. In at least one embodiment, untrained neural network 606, when trained in a supervised manner, processes inputs from training dataset 602 and compares resulting outputs against a set of expected or desired outputs. In at least one embodiment, errors are then propagated back through untrained neural network 606. In at least one embodiment, training framework 604 adjusts weights that control untrained neural network 606. In at least one embodiment, training framework 604 includes tools to monitor how well untrained neural network 606 is converging towards a model, such as trained neural network 608, suitable for generating correct answers, such as in result 614, based on known input data, such as new data 612. In at least one embodiment, training framework 604 trains untrained neural network 606 repeatedly while adjusting weights to refine an output of untrained neural network 606 using a loss function and an adjustment algorithm, such as stochastic gradient descent. In at least one embodiment, training framework 604 trains untrained neural network 606 until untrained neural network 606 achieves a desired accuracy. In at least one embodiment, trained neural network 608 can then be deployed to implement any number of machine learning operations.
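The supervised loop described above (compare outputs with desired outputs, propagate errors back, adjust weights with stochastic gradient descent, stop at a desired accuracy) can be sketched as follows. This is a minimal Python illustration that assumes a single-layer logistic model, binary cross-entropy loss, and a small hypothetical labeled dataset; the function names and hyperparameter values are assumptions, and the sketch is not training framework 604 itself.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights: List[float], bias: float, data: List[Example]) -> float:
    correct = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


def train_until_accurate(data: List[Example], target_accuracy: float = 1.0,
                         lr: float = 0.1, max_epochs: int = 1000, seed: int = 0):
    """SGD on binary cross-entropy loss, stopping once the model reaches
    target_accuracy on the training data (or after max_epochs)."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for epoch in range(1, max_epochs + 1):
        samples = data[:]
        rng.shuffle(samples)  # "stochastic": visit examples in random order
        for x, y in samples:
            error = predict(weights, bias, x) - y  # dLoss/dz for cross-entropy
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
        if accuracy(weights, bias, data) >= target_accuracy:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical labeled training set: the label is 1 when the first feature is larger.
training_set = [([2.0, 1.0], 1), ([1.0, 3.0], 0), ([3.0, 0.5], 1), ([0.5, 2.0], 0)]
w, b, epochs = train_until_accurate(training_set)
print(accuracy(w, b, training_set), epochs)
```

Checking accuracy once per epoch, rather than after every example, keeps the stopping rule simple; a real framework would normally measure accuracy on held-out validation data instead of the training set.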

In at least one embodiment, untrained neural network 606 is trained using unsupervised learning, wherein untrained neural network 606 attempts to train itself using unlabeled data. In at least one embodiment, for unsupervised learning, training dataset 602 will include input data without any associated output data or “ground truth” data. In at least one embodiment, untrained neural network 606 can learn groupings within training dataset 602 and can determine how individual inputs are related to training dataset 602. In at least one embodiment, unsupervised training can be used to generate a self-organizing map, which is a type of trained neural network 608 capable of performing operations useful in reducing dimensionality of new data 612. In at least one embodiment, unsupervised training can also be used to perform anomaly detection, which allows identification of data points in new data 612 that deviate from normal patterns of new data 612.

In at least one embodiment, semi-supervised learning may be used, which is a technique in which training dataset 602 includes a mix of labeled and unlabeled data. In at least one embodiment, training framework 604 may be used to perform incremental learning, such as through transfer learning techniques. In at least one embodiment, incremental learning enables trained neural network 608 to adapt to new data 612 without forgetting knowledge instilled within the network during initial training.
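One common way to approximate incremental learning without forgetting is to warm-start from the trained parameters and mix a small replay sample of old examples into the new data (a rehearsal strategy). The Python sketch below assumes a logistic model and this rehearsal approach; the function names, replay size, starting parameters, and data are hypothetical, and transfer learning in a full DNN would typically operate on many layers rather than on a single weight vector.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]


def _prob(weights: List[float], bias: float, x: List[float]) -> float:
    return 1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(weights, x)) + bias)))


def predict_label(weights: List[float], bias: float, x: List[float]) -> int:
    return int(_prob(weights, bias, x) >= 0.5)


def incremental_update(weights: List[float], bias: float, old_data: List[Example],
                       new_data: List[Example], replay_size: int = 2,
                       epochs: int = 50, lr: float = 0.1, seed: int = 0):
    """Fine-tune an already-trained logistic model on new data.

    Training starts from the existing parameters (warm start) instead of fresh ones,
    and a small random replay sample of old examples is mixed in so that updates are
    not driven by the new data alone.
    """
    rng = random.Random(seed)
    replay = rng.sample(old_data, min(replay_size, len(old_data)))
    data = new_data + replay
    weights = list(weights)  # copy so the caller's model is left unchanged
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass over the data
            error = _prob(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


old = [([2.0, 1.0], 1), ([1.0, 3.0], 0), ([3.0, 0.5], 1), ([0.5, 2.0], 0)]
new = [([4.0, 1.0], 1), ([1.0, 4.0], 0)]
trained_w, trained_b = [0.5, -0.5], 0.0  # hypothetical previously trained parameters
w, b = incremental_update(trained_w, trained_b, old, new)
print([predict_label(w, b, x) for x, _ in new + old])
```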

Data Center

FIG. 7 illustrates an example data center 700, in which at least one embodiment may be used. In at least one embodiment, data center 700 includes a data center infrastructure layer 710, a framework layer 720, a software layer 730, and an application layer 740.

In at least one embodiment, as shown in FIG. 7, data center infrastructure layer 710 may include a resource orchestrator 712, grouped computing resources 714, and node computing resources (“node C.R.s”) 716(1)-716(N), where “N” represents any whole, positive integer. In at least one embodiment, node C.R.s 716(1)-716(N) may include, but are not limited to, any number of central processing units (“CPUs”) or other processors (including accelerators, field programmable gate arrays (FPGAs), graphics processors, etc.), memory devices (e.g., dynamic random-access memory), storage devices (e.g., solid state or disk drives), network input/output (“NW I/O”) devices, network switches, virtual machines (“VMs”), power modules, and cooling modules, etc. In at least one embodiment, one or more node C.R.s from among node C.R.s 716(1)-716(N) may be a server having one or more of the above-mentioned computing resources.

In at least one embodiment, grouped computing resources 714 may include separate groupings of node C.R.s housed within one or more racks (not shown), or many racks housed in data centers at various geographical locations (also not shown). Separate groupings of node C.R.s within grouped computing resources 714 may include grouped compute, network, memory or storage resources that may be configured or allocated to support one or more workloads. In at least one embodiment, several node C.R.s including CPUs or processors may be grouped within one or more racks to provide compute resources to support one or more workloads. In at least one embodiment, one or more racks may also include any number of power modules, cooling modules, and network switches, in any combination.

In at least one embodiment, resource orchestrator 712 may configure or otherwise control one or more node C.R.s 716(1)-716(N) and/or grouped computing resources 714. In at least one embodiment, resource orchestrator 712 may include a software design infrastructure (“SDI”) management entity for data center 700. In at least one embodiment, resource orchestrator 712 may include hardware, software or some combination thereof.

In at least one embodiment, as shown in FIG. 7, framework layer 720 includes a job scheduler 732, a configuration manager 734, a resource manager 736 and a distributed file system 738. In at least one embodiment, framework layer 720 may include a framework to support software 732 of software layer 730 and/or one or more application(s) 742 of application layer 740. In at least one embodiment, software 732 or application(s) 742 may respectively include web-based service software or applications, such as those provided by Amazon Web Services, Google Cloud and Microsoft Azure. In at least one embodiment, framework layer 720 may be, but is not limited to, a type of free and open-source software web application framework such as Apache Spark™ (hereinafter “Spark”) that may utilize distributed file system 738 for large-scale data processing (e.g., “big data”). In at least one embodiment, job scheduler 732 may include a Spark driver to facilitate scheduling of workloads supported by various layers of data center 700. In at least one embodiment, configuration manager 734 may be capable of configuring different layers such as software layer 730 and framework layer 720 including Spark and distributed file system 738 for supporting large-scale data processing. In at least one embodiment, resource manager 736 may be capable of managing clustered or grouped computing resources mapped to or allocated for support of distributed file system 738 and job scheduler 732. In at least one embodiment, clustered or grouped computing resources may include grouped computing resource 714 at data center infrastructure layer 710. In at least one embodiment, resource manager 736 may coordinate with resource orchestrator 712 to manage these mapped or allocated computing resources.

In at least one embodiment, software 732 included in software layer 730 may include software used by at least portions of node C.R.s 716(1)-716(N), grouped computing resources 714, and/or distributed file system 738 of framework layer 720. One or more types of software may include, but are not limited to, Internet web page search software, e-mail virus scan software, database software, and streaming video content software.

In at least one embodiment, application(s) 742 included in application layer 740 may include one or more types of applications used by at least portions of node C.R.s 716(1)-716(N), grouped computing resources 714, and/or distributed file system 738 of framework layer 720. One or more types of applications may include, but are not limited to, any number of genomics applications, cognitive compute applications, and machine learning applications, including training or inferencing software, machine learning framework software (e.g., PyTorch, TensorFlow, Caffe, etc.) or other machine learning applications used in conjunction with one or more embodiments.

In at least one embodiment, any of configuration manager 734, resource manager 736, and resource orchestrator 712 may implement any number and type of self-modifying actions based on any amount and type of data acquired in any technically feasible fashion. In at least one embodiment, self-modifying actions may relieve a data center operator of data center 700 from making possibly bad configuration decisions and may help avoid underutilized and/or poorly performing portions of a data center.

In at least one embodiment, data center 700 may include tools, services, software or other resources to train one or more machine learning models or predict or infer information using one or more machine learning models according to one or more embodiments described herein. For example, in at least one embodiment, a machine learning model may be trained by calculating weight parameters according to a neural network architecture using software and computing resources described above with respect to data center 700. In at least one embodiment, trained machine learning models corresponding to one or more neural networks may be used to infer or predict information using resources described above with respect to data center 700 by using weight parameters calculated through one or more training techniques described herein.

In at least one embodiment, data center 700 may use CPUs, application-specific integrated circuits (ASICs), GPUs, FPGAs, or other hardware to perform training and/or inferencing using the above-described resources. Moreover, one or more software and/or hardware resources described above may be configured as a service to allow users to train models or to perform inferencing, such as image recognition, speech recognition, or other artificial intelligence services.

Inference and/or training logic 515 is used to perform inferencing and/or training operations associated with one or more embodiments. In at least one embodiment, inference and/or training logic 515 may be used in the system of FIG. 7 for inferencing or predicting operations based, at least in part, on weight parameters calculated using neural network training operations, neural network functions and/or architectures, or neural network use cases described herein.

As described herein, a method, computer readable medium, and system are disclosed for improving deep learning models that perform inferencing operations to provide inferenced data to software applications. In accordance with FIGS. 1-4B, an embodiment may provide a deep learning model usable for performing inferencing operations and for providing inferenced data, where the deep learning model is stored (partially or wholly) in one or both of data storage 501 and 505 in inference and/or training logic 515 as depicted in FIGS. 5A and 5B. Training and deployment of the deep learning model may be performed as depicted in FIG. 6 and described herein. For example, the deep learning model, when untrained, may subsequently be trained using training framework 604. Additionally, the deep learning model, when previously trained, may be updated to create an updated version of the deep learning model, also using training framework 604. Further, the updated version of the deep learning model may be distributed to a client for use in providing the inferenced data. Distribution of the trained or re-trained deep learning model may be performed using one or more servers in a data center 700 as depicted in FIG. 7 and described herein.
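The server-side flow of updating a model and distributing it only when the updated version meets improvement benchmarks can be sketched in Python under stated assumptions: every metric is "higher is better", an improvement benchmark is expressed as a minimum gain over the currently stored version, and distribution simply records which parameters would be sent to each client holding an older version. ModelVersion, ModelServer, benchmarks, and outbox are hypothetical names introduced for illustration, not an interface described in this disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModelVersion:
    version: int
    parameters: Dict[str, float]  # stand-in for the model's weights
    metrics: Dict[str, float]     # assumed "higher is better", e.g. {"accuracy": 0.91}


@dataclass
class ModelServer:
    """Hypothetical server that stores a deep learning model and distributes updates."""
    current: ModelVersion
    benchmarks: Dict[str, float]  # minimum required gain per metric over `current`
    client_versions: Dict[str, int] = field(default_factory=dict)
    outbox: List[Tuple[str, int, Dict[str, float]]] = field(default_factory=list)

    def meets_benchmarks(self, candidate: ModelVersion) -> bool:
        return all(
            candidate.metrics[name] - self.current.metrics[name] >= min_gain
            for name, min_gain in self.benchmarks.items()
        )

    def publish(self, candidate: ModelVersion) -> List[str]:
        """Adopt `candidate` only if it meets every benchmark, then queue its
        parameters for each client still running an older version."""
        if not self.meets_benchmarks(candidate):
            return []
        self.current = candidate
        outdated = [c for c, v in self.client_versions.items() if v < candidate.version]
        for client in outdated:
            self.outbox.append((client, candidate.version, candidate.parameters))
            self.client_versions[client] = candidate.version
        return outdated


server = ModelServer(
    current=ModelVersion(1, {"w": 0.1}, {"accuracy": 0.90}),
    benchmarks={"accuracy": 0.01},
    client_versions={"client-a": 1, "client-b": 1},
)
print(server.publish(ModelVersion(2, {"w": 0.2}, {"accuracy": 0.905})))  # []
print(server.publish(ModelVersion(2, {"w": 0.3}, {"accuracy": 0.93})))   # ['client-a', 'client-b']
```

Sending only the parameters, rather than a rebuilt application, reflects the idea that the model can be improved independently of the software application that consumes its inferenced data.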

What is claimed is:
 1. A method, comprising: storing a deep learning model usable to perform inferencing operations and for providing inferenced data; receiving one or more updates to the deep learning model; updating the deep learning model to create an updated version of the deep learning model; determining a client computing device with a previous version of the deep learning model; and automatically distributing one or more updates to the deep learning model to the client computing device when the updated version of the deep learning model meets or exceeds one or more improvement benchmarks.
 2. The method of claim 1, wherein the deep learning model is usable to perform the inferencing operations and to output the inferenced data to a software application installed on the client computing device.
 3. The method of claim 2, wherein the software application is a video game.
 4. The method of claim 1, wherein the updating the deep learning model comprises retraining the deep learning model.
 5. The method of claim 4, wherein the retraining the deep learning model comprises retraining the deep learning model using a changed dataset.
 6. The method of claim 1, wherein the automatically distributing the one or more updates to the deep learning model comprises distributing one or more updated parameters for the previous version of the deep learning model to the client computing device.
 7. The method of claim 6, wherein the automatically distributing the one or more updates to the deep learning model comprises distributing one or more updated hyperparameter adjustments for the previous version of the deep learning model to the client computing device.
 8. The method of claim 6, wherein the automatically distributing the one or more updates to the deep learning model comprises updating the previous version of the deep learning model of the client computing device with at least one of a layer substitution, a layer fusing, or input stacking.
 9. The method of claim 1, wherein the updating is performed automatically by software or other neural networks.
 10. The method of claim 1, wherein the one or more improvement benchmarks are one or more thresholds for improvement relating to accuracy, quality, or performance.
 11. A non-transitory computer-readable medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform a method comprising: storing a deep learning model; executing the deep learning model to perform inferencing operations and to provide inferenced data to a software application; receiving an updated version of the deep learning model; and executing the updated version of the deep learning model to provide additional inferenced data to the software application.
 12. The non-transitory computer-readable medium of claim 11, wherein the deep learning model and the software application are installed on a client computing device.
 13. The non-transitory computer-readable medium of claim 11, wherein the software application is a video game.
 14. The non-transitory computer-readable medium of claim 13, wherein the inferenced data includes one or more image-related inferences, the one or more image-related inferences being at least one of an anti-aliased image, an image with upscaled resolution, or a denoised image.
 15. The non-transitory computer-readable medium of claim 11, wherein the software application is a voice recognition application.
 16. The non-transitory computer-readable medium of claim 15, wherein the inferenced data includes one or more audio-related inferences, the one or more audio-related inferences being at least one of a language translation or a voice recognized command.
 17. The non-transitory computer-readable medium of claim 11, wherein executing the deep learning model to perform inferencing operations and to provide inferenced data to a software application includes: providing input data by the software application to the deep learning model, processing the input data by the deep learning model to generate the inferenced data, and outputting the inferenced data to the software application.
 18. The non-transitory computer-readable medium of claim 11, wherein the updated version of the deep learning model is received responsive to a trigger, the trigger being a request for the updated version of the deep learning model being sent to a server.
 19. The non-transitory computer-readable medium of claim 18, wherein the request is triggered as a result of a call to a feature application programming interface (API) that causes execution of the deep learning model.
 20. A system, comprising: a memory storing instructions; and one or more processors that execute the instructions to perform a method comprising: receiving a deep learning model from a server; storing the deep learning model; executing the deep learning model to perform inferencing operations and to provide inferenced data to a software application; sending a request to the server to determine whether an updated version of the deep learning model is available; receiving the updated version of the deep learning model when an updated version of the deep learning model is available that meets or exceeds one or more improvement thresholds; and executing the updated version of the deep learning model to provide additional inferenced data to the software application.
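The client-side method of claim 20 can be sketched as follows: the client receives a model, uses it to turn application input into inferenced data, and asks the server whether a newer version is available. In this hypothetical Python sketch, FakeServer, ModelClient, and latest_if_newer are illustrative names, the model is represented by a plain callable, and the improvement-threshold check is assumed to happen on the server before a version is ever offered.

```python
from typing import Callable, Optional, Tuple

Model = Callable[[float], float]  # hypothetical stand-in for an executable deep learning model


class FakeServer:
    """Hypothetical in-memory server that only offers versions which already
    met its improvement thresholds (the gating is assumed to happen server-side)."""

    def __init__(self, version: int, model: Model):
        self.version = version
        self.model = model

    def latest_if_newer(self, client_version: int) -> Optional[Tuple[int, Model]]:
        if self.version > client_version:
            return self.version, self.model
        return None


class ModelClient:
    """Hypothetical client runtime that serves inferences to a software application."""

    def __init__(self, server: FakeServer):
        self.server = server
        initial = server.latest_if_newer(client_version=-1)  # receive the first model
        if initial is None:
            raise RuntimeError("server has no model to offer")
        self.version, self.model = initial

    def infer(self, input_data: float) -> float:
        # The application provides input data; the model returns inferenced data.
        return self.model(input_data)

    def check_for_update(self) -> bool:
        """Ask the server for a newer version and swap it in if one is available."""
        update = self.server.latest_if_newer(self.version)
        if update is None:
            return False
        self.version, self.model = update
        return True


server = FakeServer(version=1, model=lambda x: 2 * x)
client = ModelClient(server)
print(client.infer(3.0))                                  # 6.0
server.version, server.model = 2, lambda x: 2 * x + 0.5  # a qualifying update is published
print(client.check_for_update(), client.infer(3.0))       # True 6.5
```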